Evan Collins (evan.collins@yale.edu)
Kelly Farley (kelly.farley@yale.edu)
Ken Stier (ken.stier@yale.edu)
Raw dataset: COVID-19 infection and death statistics from U.S. counties (sourced from NYT), combined with economic, education, and population data (sourced from various government agencies) and also survey responses about mask-wearing frequencies (sourced from NYT). 3141 complete observations on 19 metric variables and 6 categorical variables. To avoid any outliers due to population size differences between counties, all variables are scaled as a percentage of population. Variable descriptions can be found here.
Data of relevance for this pset:
Categorical Predictor 1 (rural_urban_code): The Rural-Urban Codes are numbered 1-9 according to descriptions provided by the USDA. We will regroup codes 1 through 9 as into three groups: (1) “Urban” for codes 1-3, (2) “Suburban” for codes 4-6, and (3) “Rural” for codes 7-9.
Categorical Predictor 2 (region): Each county is associated with a state name, which we will group into regions as defined by the U.S. Census Bureau: Northeast, Midwest, South, and West.
3 Continuous Response Variables: A NYT survey about masking behaviors asked people whether they wear a mask in public when they expect to be within 6 feet of another person and calculated the responses for each county for never, rarely, sometimes, frequently, and always mask. We choose to look at the extremes and will examine 3 continuous response variables: never mask, sometimes mask, and always mask.
raw <- readr::read_csv("https://evancollins.com/covid_and_demographics.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## .default = col_double(),
## X1 = col_character(),
## County_Name = col_character(),
## State_Name = col_character(),
## FIPS = col_character()
## )
## See spec(...) for full column specifications.
# create categorical variables: rural-urban code (3 levels), region (4 variables)
raw <-
raw %>%
mutate(region = case_when(
State_Name %in% c("Washington", "Oregon", "California", "Nevada", "Idaho", "Montana", "Utah", "Arizona", "Wyoming", "Colorado", "New Mexico", "Alaska", "Hawaii") ~ "West",
State_Name %in% c("North Dakota", "South Dakota", "Nebraska", "Kansas", "Minnesota", "Iowa", "Missouri", "Wisconsin", "Illinois", "Michigan", "Indiana", "Ohio") ~ "Midwest",
State_Name %in% c("Texas", "Oklahoma", "Arkansas", "Louisiana", "Mississippi", "Tennessee", "Kentucky", "Alabama", "Georgia", "Florida", "South Carolina", "North Carolina", "Virginia", "West Virginia", "District of Columbia", "Delaware", "Maryland") ~ "South",
State_Name %in% c("Pennsylvania", "New Jersey", "Connecticut", "Rhode Island", "Massachusetts", "New Hampshire", "Vermont", "Maine", "New York") ~ "Northeast"),
rural_urban_code = case_when(
Rural_Urban_Code_2013 %in% c(1, 2, 3) ~ "Urban",
Rural_Urban_Code_2013 %in% c(4, 5, 6) ~ "Suburban",
Rural_Urban_Code_2013 %in% c(7, 8, 9) ~ "Rural")
)
raw$rural_urban_code <- as.factor(raw$rural_urban_code) # Rural is reference
interaction.plot(raw$rural_urban_code, raw$region, raw$Never_Wear_Mask_Survey,
lwd = 3, col = c("red", "blue", "black", "green"), trace.label = "Region",
xlab = "Rural-Urban Setting", main = "Interaction Plot for Never Mask")
interaction.plot(raw$rural_urban_code, raw$region, raw$Sometimes_Wear_Mask_Survey,
lwd = 3, col = c("red", "blue", "black", "green"), trace.label = "Region",
xlab = "Rural-Urban Setting", main = "Interaction Plot for Sometimes Mask")
interaction.plot(raw$rural_urban_code, raw$region, raw$Always_Wear_Mask_Survey,
lwd = 3, col = c("red", "blue", "black", "green"), trace.label = "Region",
xlab = "Rural-Urban Setting", main = "Interaction Plot for Always Mask")
There does appear to be interaction between the rural-urban setting and the region, as evidenced by the non-parallel lines on the interaction plots for never masking, sometimes masking, and always masking. In particular, the West region seems to have behaviors that most contradict those of other regions, particularly in the Western suburbs.
Univariate:
Anova(lm(Never_Wear_Mask_Survey ~ region*rural_urban_code, data = raw), type = 3)
## Anova Table (Type III tests)
##
## Response: Never_Wear_Mask_Survey
## Sum Sq Df F value Pr(>F)
## (Intercept) 7.5571 1 2849.276 < 2.2e-16 ***
## region 0.5692 3 71.535 < 2.2e-16 ***
## rural_urban_code 0.6754 2 127.318 < 2.2e-16 ***
## region:rural_urban_code 0.1324 6 8.321 5.75e-09 ***
## Residuals 8.2990 3129
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(lm(Sometimes_Wear_Mask_Survey ~ region*rural_urban_code, data = raw), type = 3)
## Anova Table (Type III tests)
##
## Response: Sometimes_Wear_Mask_Survey
## Sum Sq Df F value Pr(>F)
## (Intercept) 10.8499 1 3966.9689 < 2e-16 ***
## region 0.3800 3 46.3072 < 2e-16 ***
## rural_urban_code 0.2320 2 42.4113 < 2e-16 ***
## region:rural_urban_code 0.0294 6 1.7926 0.09662 .
## Residuals 8.5580 3129
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(lm(Always_Wear_Mask_Survey ~ region*rural_urban_code, data = raw), type = 3)
## Anova Table (Type III tests)
##
## Response: Always_Wear_Mask_Survey
## Sum Sq Df F value Pr(>F)
## (Intercept) 59.119 1 4181.798 < 2.2e-16 ***
## region 4.959 3 116.919 < 2.2e-16 ***
## rural_urban_code 2.979 2 105.361 < 2.2e-16 ***
## region:rural_urban_code 0.934 6 11.017 3.485e-12 ***
## Residuals 44.236 3129
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Univariate:
The region, rural-urban code, and their interaction are significant in all 3 univariate models for masking behaviors, with p-values << alpha = 0.05. Based on coefficients, region is a more important predictor of sometimes masking, then always masking, then never masking; rural-urban code is a more important predictor of never masking, then sometimes masking, then always masking. but overall a less important predictor than region. Their interaction is most important in never masking, then always masking, then sometimes masking.
Multivariate:
Overall, there are differences between region and rural-urban codes (all multivariate statistics are significant). All of the multivariate tests suggest there is an interaction effect between region and rural-urban code.
multimod <- lm(cbind(Never_Wear_Mask_Survey, Sometimes_Wear_Mask_Survey, Always_Wear_Mask_Survey) ~ region*rural_urban_code, data = raw)
summary(Anova(multimod), univariate=T)
##
## Type II MANOVA Tests:
##
## Sum of squares and products for error:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 8.298969 1.335098
## Sometimes_Wear_Mask_Survey 1.335098 8.558012
## Always_Wear_Mask_Survey -11.042686 -11.266079
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -11.04269
## Sometimes_Wear_Mask_Survey -11.26608
## Always_Wear_Mask_Survey 44.23570
##
## ------------------------------------------
##
## Term: region
##
## Sum of squares and products for the hypothesis:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 0.9675336 1.031265
## Sometimes_Wear_Mask_Survey 1.0312653 1.172121
## Always_Wear_Mask_Survey -3.7941207 -4.016547
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -3.794121
## Sometimes_Wear_Mask_Survey -4.016547
## Always_Wear_Mask_Survey 15.197097
##
## Multivariate Tests: region
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 3 0.2828710 108.5832 9 9387.000 < 2.22e-16 ***
## Wilks 3 0.7240752 119.9592 9 7610.447 < 2.22e-16 ***
## Hotelling-Lawley 3 0.3714955 129.0190 9 9377.000 < 2.22e-16 ***
## Roy 3 0.3438340 358.6189 3 3129.000 < 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: rural_urban_code
##
## Sum of squares and products for the hypothesis:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 0.9798586 0.7168383
## Sometimes_Wear_Mask_Survey 0.7168383 0.5430791
## Always_Wear_Mask_Survey -2.7350157 -2.0139890
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -2.735016
## Sometimes_Wear_Mask_Survey -2.013989
## Always_Wear_Mask_Survey 7.643303
##
## Multivariate Tests: rural_urban_code
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 2 0.1625339 92.22958 6 6256 < 2.22e-16 ***
## Wilks 2 0.8378157 96.42712 6 6254 < 2.22e-16 ***
## Hotelling-Lawley 2 0.1931626 100.63771 6 6252 < 2.22e-16 ***
## Roy 2 0.1909774 199.12574 3 3128 < 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: region:rural_urban_code
##
## Sum of squares and products for the hypothesis:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 0.13241775 0.04391835
## Sometimes_Wear_Mask_Survey 0.04391835 0.02941714
## Always_Wear_Mask_Survey -0.21615727 -0.13410491
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -0.2161573
## Sometimes_Wear_Mask_Survey -0.1341049
## Always_Wear_Mask_Survey 0.9344807
##
## Multivariate Tests: region:rural_urban_code
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 6 0.0418657 7.380650 18 9387.000 < 2.22e-16 ***
## Wilks 6 0.9585755 7.405305 18 8844.977 < 2.22e-16 ***
## Hotelling-Lawley 6 0.0427549 7.424303 18 9377.000 < 2.22e-16 ***
## Roy 6 0.0254642 13.279579 6 3129.000 6.5347e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Type II Sums of Squares
## df Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## region 3 0.96753 1.172121
## rural_urban_code 2 0.97986 0.543079
## region:rural_urban_code 6 0.13242 0.029417
## residuals 3129 8.29897 8.558012
## Always_Wear_Mask_Survey
## region 15.19710
## rural_urban_code 7.64330
## region:rural_urban_code 0.93448
## residuals 44.23570
##
## F-tests
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## region 121.60 214.28
## rural_urban_code 123.15 99.28
## region:rural_urban_code 16.64 5.38
## Always_Wear_Mask_Survey
## region 179.16
## rural_urban_code 90.11
## region:rural_urban_code 11.02
##
## p-values
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## region < 2.22e-16 < 2.22e-16
## rural_urban_code < 2.22e-16 < 2.22e-16
## region:rural_urban_code 1.0011e-10 0.0046608
## Always_Wear_Mask_Survey
## region < 2.22e-16
## rural_urban_code < 2.22e-16
## region:rural_urban_code 3.4855e-12
Let’s do the following comparisons:
Multivariate - Rural vs. Suburban
Multivariate - Rural vs. Urban
Univariate - Rural vs. Urban
Multivariate - Interaction between Rural and Urban between regions Northeast and South
Univariate - Interaction between Rural and Urban between regions Northeast and South
First, let’s make a variable catcomb that combines both categorical variables.
raw$catcomb <- paste(raw$rural_urban_code, raw$region, sep = "")
table(raw$catcomb)
##
## RuralMidwest RuralNortheast RuralSouth RuralWest
## 443 29 406 199
## SuburbanMidwest SuburbanNortheast SuburbanSouth SuburbanWest
## 310 58 424 107
## UrbanMidwest UrbanNortheast UrbanSouth UrbanWest
## 302 130 592 141
options(contrasts = c("contr.treatment", "contr.poly"))
# Make catcomb a factor
raw$catcomb <- as.factor(raw$catcomb) # RuralMidwest is reference level
# Multivariate - Fit one way MANOVA model
multimod2 <- lm(cbind(Never_Wear_Mask_Survey, Sometimes_Wear_Mask_Survey, Always_Wear_Mask_Survey) ~ catcomb, data = raw)
# Univariate - Fit one way ANOVA model just for Never_Wear_Mask_Survey
multimodNever <- lm(Never_Wear_Mask_Survey ~ catcomb, data = raw)
contrasts(raw$catcomb)
## RuralNortheast RuralSouth RuralWest SuburbanMidwest
## RuralMidwest 0 0 0 0
## RuralNortheast 1 0 0 0
## RuralSouth 0 1 0 0
## RuralWest 0 0 1 0
## SuburbanMidwest 0 0 0 1
## SuburbanNortheast 0 0 0 0
## SuburbanSouth 0 0 0 0
## SuburbanWest 0 0 0 0
## UrbanMidwest 0 0 0 0
## UrbanNortheast 0 0 0 0
## UrbanSouth 0 0 0 0
## UrbanWest 0 0 0 0
## SuburbanNortheast SuburbanSouth SuburbanWest UrbanMidwest
## RuralMidwest 0 0 0 0
## RuralNortheast 0 0 0 0
## RuralSouth 0 0 0 0
## RuralWest 0 0 0 0
## SuburbanMidwest 0 0 0 0
## SuburbanNortheast 1 0 0 0
## SuburbanSouth 0 1 0 0
## SuburbanWest 0 0 1 0
## UrbanMidwest 0 0 0 1
## UrbanNortheast 0 0 0 0
## UrbanSouth 0 0 0 0
## UrbanWest 0 0 0 0
## UrbanNortheast UrbanSouth UrbanWest
## RuralMidwest 0 0 0
## RuralNortheast 0 0 0
## RuralSouth 0 0 0
## RuralWest 0 0 0
## SuburbanMidwest 0 0 0
## SuburbanNortheast 0 0 0
## SuburbanSouth 0 0 0
## SuburbanWest 0 0 0
## UrbanMidwest 0 0 0
## UrbanNortheast 1 0 0
## UrbanSouth 0 1 0
## UrbanWest 0 0 1
levels(raw$catcomb)
## [1] "RuralMidwest" "RuralNortheast" "RuralSouth"
## [4] "RuralWest" "SuburbanMidwest" "SuburbanNortheast"
## [7] "SuburbanSouth" "SuburbanWest" "UrbanMidwest"
## [10] "UrbanNortheast" "UrbanSouth" "UrbanWest"
# Get multivariate contrast for Rural vs. Suburban
linearHypothesis(multimod2, "catcombRuralNortheast + catcombRuralSouth + catcombRuralWest - catcombSuburbanMidwest - catcombSuburbanNortheast - catcombSuburbanSouth - catcombSuburbanWest = 0")
##
## Sum of squares and products for the hypothesis:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 0.06956023 0.020719275
## Sometimes_Wear_Mask_Survey 0.02071928 0.006171462
## Always_Wear_Mask_Survey -0.20126412 -0.059948715
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -0.20126412
## Sometimes_Wear_Mask_Survey -0.05994872
## Always_Wear_Mask_Survey 0.58233336
##
## Sum of squares and products for error:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 8.298969 1.335098
## Sometimes_Wear_Mask_Survey 1.335098 8.558012
## Always_Wear_Mask_Survey -11.042686 -11.266079
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -11.04269
## Sometimes_Wear_Mask_Survey -11.26608
## Always_Wear_Mask_Survey 44.23570
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0156343 16.55496 3 3127 1.1358e-10 ***
## Wilks 1 0.9843657 16.55496 3 3127 1.1358e-10 ***
## Hotelling-Lawley 1 0.0158826 16.55496 3 3127 1.1358e-10 ***
## Roy 1 0.0158826 16.55496 3 3127 1.1358e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For the multivariate tests between Rural and Suburban shown above, we can see that the Pillai, Wilks, Hotelling-Lawley, and Roy values all have p=1.1358e-10 < 0.05. Thus, we reject the null hypothesis and conclude Rural and Suburban are significantly different.
# Get multivariate contrast for Rural vs. Urban
linearHypothesis(multimod2, "catcombRuralNortheast + catcombRuralSouth + catcombRuralWest - catcombUrbanMidwest - catcombUrbanNortheast - catcombUrbanSouth - catcombUrbanWest = 0")
##
## Sum of squares and products for the hypothesis:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 0.4017247 0.2808438
## Sometimes_Wear_Mask_Survey 0.2808438 0.1963366
## Always_Wear_Mask_Survey -1.2512379 -0.8747344
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -1.2512379
## Sometimes_Wear_Mask_Survey -0.8747344
## Always_Wear_Mask_Survey 3.8971867
##
## Sum of squares and products for error:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 8.298969 1.335098
## Sometimes_Wear_Mask_Survey 1.335098 8.558012
## Always_Wear_Mask_Survey -11.042686 -11.266079
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -11.04269
## Sometimes_Wear_Mask_Survey -11.26608
## Always_Wear_Mask_Survey 44.23570
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0840648 95.66564 3 3127 < 2.22e-16 ***
## Wilks 1 0.9159352 95.66564 3 3127 < 2.22e-16 ***
## Hotelling-Lawley 1 0.0917803 95.66564 3 3127 < 2.22e-16 ***
## Roy 1 0.0917803 95.66564 3 3127 < 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For the multivariate tests between Rural and Urban shown above, we can see that the Pillai, Wilks, Hotelling-Lawley, and Roy values all have p=2.22e-16 < 0.05. Thus, we reject the null hypothesis and conclude Rural and Urban are significantly different.
# Get univariate contrast for Rural vs. Urban
linearHypothesis(multimodNever, "catcombRuralNortheast + catcombRuralSouth + catcombRuralWest - catcombUrbanMidwest - catcombUrbanNortheast - catcombUrbanSouth - catcombUrbanWest = 0")
## Linear hypothesis test
##
## Hypothesis:
## catcombRuralNortheast + catcombRuralSouth + catcombRuralWest - catcombUrbanMidwest - catcombUrbanNortheast - catcombUrbanSouth - catcombUrbanWest = 0
##
## Model 1: restricted model
## Model 2: Never_Wear_Mask_Survey ~ catcomb
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 3130 8.7007
## 2 3129 8.2990 1 0.40172 151.46 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For the univariate F-test between Rural and Urban shown above, we can see that the p<2.2e-16 < 0.05. Thus, we reject the null hypothesis and conclude Rural and Urban are significantly different for Never_Wear_Mask_Survey.
#Get multivariate contrast for Northeast,South and Rural,Urban interaction
linearHypothesis(multimod2, "catcombRuralNortheast - catcombUrbanNortheast - catcombRuralSouth + catcombUrbanSouth = 0")
##
## Sum of squares and products for the hypothesis:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 0.0003439765 0.001177750
## Sometimes_Wear_Mask_Survey 0.0011777504 0.004032532
## Always_Wear_Mask_Survey 0.0004625952 0.001583892
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey 0.0004625952
## Sometimes_Wear_Mask_Survey 0.0015838923
## Always_Wear_Mask_Survey 0.0006221191
##
## Sum of squares and products for error:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 8.298969 1.335098
## Sometimes_Wear_Mask_Survey 1.335098 8.558012
## Always_Wear_Mask_Survey -11.042686 -11.266079
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -11.04269
## Sometimes_Wear_Mask_Survey -11.26608
## Always_Wear_Mask_Survey 44.23570
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0012273 1.280839 3 3127 0.27918
## Wilks 1 0.9987727 1.280839 3 3127 0.27918
## Hotelling-Lawley 1 0.0012288 1.280839 3 3127 0.27918
## Roy 1 0.0012288 1.280839 3 3127 0.27918
For the multivariate test above evaluating the interaction between Rural and Urban between regions Northeast and South, the difference is not shown to be significantly different, as the p value (0.27918) of the multivariate tests is greater than 0.05.
#Get univariate contrast for Northeast,South and Rural,Urban interaction
linearHypothesis(multimodNever, "catcombRuralNortheast - catcombUrbanNortheast - catcombRuralSouth + catcombUrbanSouth = 0")
## Linear hypothesis test
##
## Hypothesis:
## catcombRuralNortheast - catcombRuralSouth - catcombUrbanNortheast + catcombUrbanSouth = 0
##
## Model 1: restricted model
## Model 2: Never_Wear_Mask_Survey ~ catcomb
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 3130 8.2993
## 2 3129 8.2990 1 0.00034398 0.1297 0.7188
For the univariate test above evaluating the interaction between Rural and Urban between regions Northeast and South, the difference in Never_Mask_Survey is not shown to be significantly different, as the p value (0.7188) is greater than 0.05.
Let’s add two other continuous variables as covariates to the model and fit as a multiple-response linear model. We will include `andPercent_Adults_Bachelors_or_Higher` as covariates.
Let’s first plot the relationships between the covariates and the three response variables.
names(raw)
## [1] "X1"
## [2] "County_Name"
## [3] "State_Name"
## [4] "FIPS"
## [5] "Never_Wear_Mask_Survey"
## [6] "Rarely_Wear_Mask_Survey"
## [7] "Sometimes_Wear_Mask_Survey"
## [8] "Frequently_Wear_Mask_Survey"
## [9] "Always_Wear_Mask_Survey"
## [10] "Unemployment_Rate_2019"
## [11] "Median_Household_Income_2019"
## [12] "Median_Household_Income_Percent_of_State_Total_2019"
## [13] "Percent_Poverty_2019"
## [14] "Percent_Adults_Less_Than_HS"
## [15] "Percent_Adults_Bachelors_or_Higher"
## [16] "Population_2019"
## [17] "Net_Migration_Rate_2019"
## [18] "Death_Rate_2019"
## [19] "Birth_Rate_2019"
## [20] "Rural_Urban_Code_2013"
## [21] "Economic_Typology_2015"
## [22] "Covid_Confirmed_Cases_as_pct"
## [23] "Covid_Deaths_as_pct"
## [24] "Covid_New_Cases_as_pct"
## [25] "Civilian_Labor_Force_2019_as_pct"
## [26] "region"
## [27] "rural_urban_code"
## [28] "catcomb"
# For Median_Household_Income_2019
ggplot(raw, aes(x=Never_Wear_Mask_Survey, y=Median_Household_Income_2019)) + geom_smooth(method = lm, color = "green") + geom_point(color = "red", cex=0.5, alpha=0.3) + labs(title="Never_Wear_Mask_Survey vs. Median Household Income")
ggplot(raw, aes(x=Sometimes_Wear_Mask_Survey, y=Median_Household_Income_2019)) + geom_smooth(method = lm, color = "green") + geom_point(color = "red", cex=0.5, alpha=0.3) + labs(title="Sometimes_Wear_Mask_Survey vs. Median Household Income")
ggplot(raw, aes(x=Always_Wear_Mask_Survey, y=Median_Household_Income_2019)) + geom_smooth(method = lm, color = "green") + geom_point(color = "red", cex=0.5, alpha=0.3) + labs(title="Always_Wear_Mask_Survey vs. Median Household Income")
# For Percent_Adults_Bachelors_or_Higher
ggplot(raw, aes(x=Never_Wear_Mask_Survey, y=Percent_Adults_Bachelors_or_Higher)) + geom_smooth(method = lm, color = "blue") + geom_point(color = "red", cex=0.5, alpha=0.3) + labs(title="Never_Wear_Mask_Survey vs. % Adults with Bachelor's or Higher")
ggplot(raw, aes(x=Sometimes_Wear_Mask_Survey, y=Percent_Adults_Bachelors_or_Higher)) + geom_smooth(method = lm, color = "blue") + geom_point(color = "red", cex=0.5, alpha=0.3) + labs(title="Sometimes_Wear_Mask_Survey vs. % Adults with Bachelor's or Higher")
ggplot(raw, aes(x=Always_Wear_Mask_Survey, y=Percent_Adults_Bachelors_or_Higher)) + geom_smooth(method = lm, color = "blue") + geom_point(color = "red", cex=0.5, alpha=0.3) + labs(title="Always_Wear_Mask_Survey vs. % Adults with Bachelor's or Higher")
options(contrasts = c("contr.sum", "contr.poly"))
multimod3 <- lm(cbind(Never_Wear_Mask_Survey, Sometimes_Wear_Mask_Survey, Always_Wear_Mask_Survey) ~ region*rural_urban_code + Median_Household_Income_2019 + Percent_Adults_Bachelors_or_Higher, data = raw)
#Multivariate results and univariate results with with type 3 Sum of squares
summary(Anova(multimod3, type = 3), univariate = T)
##
## Type III MANOVA Tests:
##
## Sum of squares and products for error:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 8.024013 1.052061
## Sometimes_Wear_Mask_Survey 1.052061 8.248288
## Always_Wear_Mask_Survey -10.274843 -10.429628
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -10.27484
## Sometimes_Wear_Mask_Survey -10.42963
## Always_Wear_Mask_Survey 41.97602
##
## ------------------------------------------
##
## Term: (Intercept)
##
## Sum of squares and products for the hypothesis:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 1.465860 1.927422
## Sometimes_Wear_Mask_Survey 1.927422 2.534319
## Always_Wear_Mask_Survey 6.249751 8.217641
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey 6.249751
## Sometimes_Wear_Mask_Survey 8.217641
## Always_Wear_Mask_Survey 26.646065
##
## Multivariate Tests: (Intercept)
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.805661 4318.38 3 3125 < 2.22e-16 ***
## Wilks 1 0.194339 4318.38 3 3125 < 2.22e-16 ***
## Hotelling-Lawley 1 4.145645 4318.38 3 3125 < 2.22e-16 ***
## Roy 1 4.145645 4318.38 3 3125 < 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: region
##
## Sum of squares and products for the hypothesis:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 0.8891676 0.8878287
## Sometimes_Wear_Mask_Survey 0.8878287 0.9500496
## Always_Wear_Mask_Survey -3.4477626 -3.4579297
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -3.447763
## Sometimes_Wear_Mask_Survey -3.457930
## Always_Wear_Mask_Survey 13.494766
##
## Multivariate Tests: region
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 3 0.2577033 97.9518 9 9381.000 < 2.22e-16 ***
## Wilks 3 0.7457420 108.2628 9 7605.579 < 2.22e-16 ***
## Hotelling-Lawley 3 0.3363322 116.7322 9 9371.000 < 2.22e-16 ***
## Roy 3 0.3220783 335.7130 3 3127.000 < 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: rural_urban_code
##
## Sum of squares and products for the hypothesis:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 0.1804586 0.10959621
## Sometimes_Wear_Mask_Survey 0.1095962 0.07673975
## Always_Wear_Mask_Survey -0.6026318 -0.37800526
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -0.6026318
## Sometimes_Wear_Mask_Survey -0.3780053
## Always_Wear_Mask_Survey 2.0266368
##
## Multivariate Tests: rural_urban_code
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 2 0.0489154 26.12387 6 6252 < 2.22e-16 ***
## Wilks 2 0.9511373 26.42162 6 6250 < 2.22e-16 ***
## Hotelling-Lawley 2 0.0513174 26.71925 6 6248 < 2.22e-16 ***
## Roy 2 0.0502123 52.32121 3 3126 < 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: Median_Household_Income_2019
##
## Sum of squares and products for the hypothesis:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 0.03812532 0.014684402
## Sometimes_Wear_Mask_Survey 0.01468440 0.005655864
## Always_Wear_Mask_Survey -0.04490704 -0.017296456
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -0.04490704
## Sometimes_Wear_Mask_Survey -0.01729646
## Always_Wear_Mask_Survey 0.05289508
##
## Multivariate Tests: Median_Household_Income_2019
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0053326 5.584584 3 3125 0.00081009 ***
## Wilks 1 0.9946674 5.584584 3 3125 0.00081009 ***
## Hotelling-Lawley 1 0.0053612 5.584584 3 3125 0.00081009 ***
## Roy 1 0.0053612 5.584584 3 3125 0.00081009 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: Percent_Adults_Bachelors_or_Higher
##
## Sum of squares and products for the hypothesis:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 0.07095388 0.1041369
## Sometimes_Wear_Mask_Survey 0.10413686 0.1528385
## Always_Wear_Mask_Survey -0.27609096 -0.4052103
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -0.2760910
## Sometimes_Wear_Mask_Survey -0.4052103
## Always_Wear_Mask_Survey 1.0743065
##
## Multivariate Tests: Percent_Adults_Bachelors_or_Higher
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0283637 30.40804 3 3125 < 2.22e-16 ***
## Wilks 1 0.9716363 30.40804 3 3125 < 2.22e-16 ***
## Hotelling-Lawley 1 0.0291917 30.40804 3 3125 < 2.22e-16 ***
## Roy 1 0.0291917 30.40804 3 3125 < 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: region:rural_urban_code
##
## Sum of squares and products for the hypothesis:
## Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey 0.15414500 0.05640962
## Sometimes_Wear_Mask_Survey 0.05640962 0.03422600
## Always_Wear_Mask_Survey -0.25568153 -0.15534870
## Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey -0.2556815
## Sometimes_Wear_Mask_Survey -0.1553487
## Always_Wear_Mask_Survey 1.0127658
##
## Multivariate Tests: region:rural_urban_code
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 6 0.0460407 8.122949 18 9381.00 < 2.22e-16 ***
## Wilks 6 0.9544997 8.152075 18 8839.32 < 2.22e-16 ***
## Hotelling-Lawley 6 0.0471036 8.174216 18 9371.00 < 2.22e-16 ***
## Roy 6 0.0266350 13.881261 6 3127.00 1.2243e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Type III Sums of Squares
## df Never_Wear_Mask_Survey
## (Intercept) 1 1.465860
## region 3 0.889168
## rural_urban_code 2 0.180459
## Median_Household_Income_2019 1 0.038125
## Percent_Adults_Bachelors_or_Higher 1 0.070954
## region:rural_urban_code 6 0.154145
## residuals 3127 8.024013
## Sometimes_Wear_Mask_Survey
## (Intercept) 2.5343190
## region 0.9500496
## rural_urban_code 0.0767398
## Median_Household_Income_2019 0.0056559
## Percent_Adults_Bachelors_or_Higher 0.1528385
## region:rural_urban_code 0.0342260
## residuals 8.2482885
## Always_Wear_Mask_Survey
## (Intercept) 26.646065
## region 13.494766
## rural_urban_code 2.026637
## Median_Household_Income_2019 0.052895
## Percent_Adults_Bachelors_or_Higher 1.074307
## region:rural_urban_code 1.012766
## residuals 41.976018
##
## F-tests
## Never_Wear_Mask_Survey
## (Intercept) 571.25
## region 346.51
## rural_urban_code 70.33
## Median_Household_Income_2019 14.86
## Percent_Adults_Bachelors_or_Higher 27.65
## region:rural_urban_code 60.07
## Sometimes_Wear_Mask_Survey
## (Intercept) 320.26
## region 360.17
## rural_urban_code 9.70
## Median_Household_Income_2019 2.14
## Percent_Adults_Bachelors_or_Higher 19.31
## region:rural_urban_code 12.98
## Always_Wear_Mask_Survey
## (Intercept) 992.50
## region 167.55
## rural_urban_code 75.49
## Median_Household_Income_2019 0.66
## Percent_Adults_Bachelors_or_Higher 40.02
## region:rural_urban_code 12.57
##
## p-values
## Never_Wear_Mask_Survey
## (Intercept) < 2.22e-16
## region < 2.22e-16
## rural_urban_code < 2.22e-16
## Median_Household_Income_2019 0.00011827
## Percent_Adults_Bachelors_or_Higher 1.5506e-07
## region:rural_urban_code 1.2280e-14
## Sometimes_Wear_Mask_Survey
## (Intercept) < 2.22e-16
## region < 2.22e-16
## rural_urban_code 2.2800e-06
## Median_Household_Income_2019 0.14321118
## Percent_Adults_Bachelors_or_Higher 2.0891e-12
## region:rural_urban_code 0.00032053
## Always_Wear_Mask_Survey
## (Intercept) < 2.22e-16
## region < 2.22e-16
## rural_urban_code < 2.22e-16
## Median_Household_Income_2019 0.68473479
## Percent_Adults_Bachelors_or_Higher < 2.22e-16
## region:rural_urban_code 4.6446e-14
In the above output chunk labeled “Term: region”, we can see region is a significant (p<2.22e-16) multivariate predictor.
In the above output chunk labeled “Term: rural_urban_code”, we can see rural-urban code is a significant (p<2.22e-16) multivariate predictor.
In the above output chunk labeled “Term: Median_Household_Income_2019”, we can see median household income is a significant (p=0.00081009) multivariate predictor.
In the above output chunk labeled “Term: Percent_Adults_Bachelors_or_Higher”, we can see median household income is a significant (p<2.22e-16) multivariate predictor.
In the above output chunk labeled “Term: region:rural_urban_code”, we can see the interaction between rural and rural-urban code is a significant (p<2.22e-16) multivariate predictor.
In the bottom chunk labeled “Type III Sums of Squares”, for each response variables, we can see the type III sum of squares, type III F-tests, and associated p values. From these univariate results, we can see that region is significant for each of the three response variables (Never_Wear_Mask_Survey, Sometimes_Wear_Mask_Survey, Always_Wear_Mask_Survey). Moreover, rural-urban code is significant for each of the three response variables. Median household income is significant just for Never_Wear_Mask_Survey. Percent of adults with a bachelor’s degree of higher is significant for all three response variables. And the interaction between region and rural-urban code is significant for each of the three response variables.